Reprogrammable redundancy for cache Vmin reduction in a 28nm RISC-V processor

Authors

  • Brian Zimmer
  • Pi-Feng Chiu
  • Borivoje Nikolic
  • Krste Asanovic
Abstract

The presented processor lowers SRAM-based cache Vmin by using three architectural techniques: bit bypass (BB), dynamic column redundancy (DCR), and line disable (LD). These techniques use low-overhead reprogrammable redundancy (RR) to avoid failing bitcells and therefore increase the maximum allowable bitcell failure rate in processor caches. In the 28nm chip, the Vmin of the 1MB L2 cache is reduced by 25%, resulting in a 49% power reduction with a 2% area overhead and minimal timing overhead.

Introduction

Lowering the minimum operating voltage (Vmin) of SRAM-based caches improves the energy efficiency of digital systems. A wide variety of circuit-level assist techniques have been proposed to reduce Vmin by reducing the bitcell failure rate [1], but they require redevelopment for each new technology and incur high area and power overhead. Alternatively, architecture-level techniques increase the allowable failure rate by tolerating failing bitcells, as shown in Figure 1, and can either supplant or supplement existing assist schemes.

Parametric variations cause a wide spread of minimum operating voltages for bitcells in a chip. The increase in bitcell error rate (BER) due to a decrease in voltage, referred to here as the failure slope, is dictated by the process technology, SRAM bitcell, and SRAM periphery architecture. The 28nm SRAM used in this work has a measured failure slope of around 50mV per decade: a 50mV reduction in VDD increases the number of failures by ten times. Resiliency techniques are evaluated based on their ability to increase the maximum allowable BER, and BER is translated into voltage reduction based on the failure slope. A gradual failure slope improves the effectiveness of architecture-level techniques, as the same BER difference translates to a larger voltage difference. To have adequate yield for large SRAM-based caches, the bitcell error rate must be extremely low, about 1×10^-10 for a 1MB cache.
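The BER-to-voltage translation and the yield requirement described above can be illustrated with a minimal Python sketch. It assumes a constant failure slope across the whole voltage range and independent bitcell failures, and it counts only the data bits of a 1MB cache; the function names and these simplifications are illustrative assumptions, not the authors' model.

```python
import math

# Measured failure slope quoted in the text: about 50 mV per decade of BER.
FAILURE_SLOPE_MV_PER_DECADE = 50.0

def vmin_reduction_mv(ber_baseline, ber_tolerated,
                      slope=FAILURE_SLOPE_MV_PER_DECADE):
    """Voltage headroom (mV) gained by raising the allowable BER.

    Assumes the failure slope is constant between the two BER points.
    """
    decades = math.log10(ber_tolerated / ber_baseline)
    return decades * slope

def unprotected_yield(ber, n_bits):
    """Probability that none of n_bits independent bitcells fails."""
    return math.exp(n_bits * math.log1p(-ber))

# Data bits in a 1MB cache; tag and ECC bits are ignored (assumption).
L2_BITS = 1 * 1024 * 1024 * 8

if __name__ == "__main__":
    print("Headroom from 1e-10 to 1e-4 BER (mV):",
          vmin_reduction_mv(1e-10, 1e-4))
    print("Yield with no protection at BER 1e-10:",
          unprotected_yield(1e-10, L2_BITS))
    print("Yield with no protection at BER 1e-4:",
          unprotected_yield(1e-4, L2_BITS))
```

Under these assumptions, each decade of additional tolerated BER buys one slope's worth of voltage, which is why a gradual slope favors architecture-level techniques. The headroom figure is a property of the simplified model, not a measured result.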
A resiliency modeling framework translates the probability of bitcell failure to cache yield for a variety of architecture-level resiliency techniques in order to evaluate their relative effectiveness. Lightweight resiliency techniques, such as static redundancy [2], achieve significant Vmin reduction at low cost by tolerating failures in a very small number of cells. However, static column and row redundancy can only cope with a limited number of failures, as the overhead of the required circuitry scales poorly with increased protection. More aggressive techniques, such as line disable [3] or error correcting codes (ECC) [4], can tolerate more failing cells and therefore a higher BER than static redundancy, but still have limited effectiveness. The maximum Vmin reduction allowed by line disable is limited by diminished cache capacity at high failure rates (as an entire line needs to be disabled to repair a single bit), and ECC effectiveness is limited by uncorrectable double-bit errors. At high voltages, orders-of-magnitude changes in the failure rate translate to only small changes in the absolute number of failing bitcells, but at low voltages the number of failing cells increases dramatically, and trading off increased faults for decreased VDD becomes much less attractive. The limits of fault avoidance for Vmin reduction have been explored by aggressive techniques [5], [6], and the lack of known silicon implementations of these ideas reflects the high overhead costs and implementation complexity required to tolerate such high failure rates.

Fig. 1: Circuit-level techniques lower Vmin by improving bitcells while architecture-level techniques lower Vmin by tolerating bitcell failures. (Plot of bit error rate (BER) versus Vdd, showing the bitcell failure rate and the allowable failure rate.)
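The two limits named above (capacity loss for line disable, double-bit errors for ECC) can be made concrete with a small binomial model. The sketch below is not the paper's modeling framework. It assumes independent bitcell failures, a hypothetical 512-bit cache line, and a hypothetical 72-bit SECDED word (64 data bits plus 8 check bits). None of these parameters come from the text.

```python
import math

def p_line_faulty(ber, bits_per_line=512):
    """Probability that a line contains at least one failing bitcell."""
    return -math.expm1(bits_per_line * math.log1p(-ber))

def line_disable_capacity(ber, bits_per_line=512):
    """Expected fraction of lines left usable if every faulty line is disabled."""
    return 1.0 - p_line_faulty(ber, bits_per_line)

def p_word_uncorrectable(ber, word_bits=72):
    """Probability that a SECDED word holds two or more failing bits.

    Summing the binomial tail directly avoids the cancellation that
    1 - P(0) - P(1) would suffer at very small BER.
    """
    return sum(
        math.comb(word_bits, k) * ber**k * (1.0 - ber) ** (word_bits - k)
        for k in range(2, word_bits + 1)
    )

def secded_yield(ber, n_words, word_bits=72):
    """Probability that no word in the array has an uncorrectable error."""
    return math.exp(n_words * math.log1p(-p_word_uncorrectable(ber, word_bits)))

# Number of 64-bit data words in a 1MB cache (assumed organization).
L2_WORDS = (1024 * 1024) // 8

if __name__ == "__main__":
    for ber in (1e-6, 1e-5, 1e-4, 1e-3):
        print(f"BER {ber:.0e}: "
              f"line-disable capacity {line_disable_capacity(ber):.3f}, "
              f"SECDED-only yield {secded_yield(ber, L2_WORDS):.3f}")
```

With these assumed parameters, the model shows both trends from the text: the usable capacity under line disable shrinks steadily as BER rises, and SECDED on its own stops yielding once double-bit errors become likely somewhere in the array.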
This work proposes a new architecture-level redundancy technique, called dynamic column redundancy (DCR), that targets a sweet-spot failure rate of 1×10^-4 and achieves a lower Vmin than other proposed techniques (such as static redundancy, ECC, and line disable) with low area, delay, and energy overhead. Vmin is reduced further by supplementing DCR with line disable (LD) to tolerate multi-bit failures, and with another reprogrammable redundancy technique, bit bypass (BB), to protect against failures in the tag arrays without requiring SRAM assist techniques for the tag macros. The proposed techniques have a low enough overhead to be used in both the L1 and L2 caches, in contrast to many prior techniques that target L2 caches only. These techniques are verified through implementation in a 28nm RISC-V [7] processor, where the proposed RR techniques enable a 25% Vmin reduction with 2% area overhead.

Reprogrammable Redundancy Implementation

Figure 2 shows the system diagram of the implemented processor with reprogrammable redundancy. The processor is based on a 64-bit RISC-V single-issue in-order 6-stage pipeline [8]. To support DVFS, the processor is split into three independent voltage and frequency domains: the processor pipeline with L1 cache, the L2 cache, and the uncore I/O domain. The L1 consists of an 8KB instruction cache and a 16KB data cache with 8T-based macros. The 1MB, 4-bank L2 cache uses high-density 6T-based macros. The three architecture-level RR techniques protect all SRAM bitcells on the chip: BB protects the tag arrays, and DCR and LD protect the data portions of both the L1 and L2 caches. An at-speed SRAM built-in self-test (BIST) quickly identifies fault locations. Asynchronous FIFOs and level shifters allow communication between the voltage and frequency islands. Every SRAM includes single error correction and double error detection (SECDED) protection.
While testing RR, the correction capability is disabled and errors are simply logged to ensure that all SRAM faults are identified. SECDED correction can be enabled to protect against soft errors, or to protect against intermittent errors by reprogramming the redundant entries.
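The division of labor between DCR and LD in the data arrays can be sketched as a mapping from a BIST fault list to per-line repair actions. This is a deliberately simplified illustration: it assumes one DCR repair per cache line and a fault list of hypothetical (line, bit) pairs. The excerpt does not state the actual repair granularity or the programming interface, and tag-array faults (handled by BB) are left out.

```python
from collections import defaultdict

def plan_repairs(data_faults):
    """Map BIST-reported (line, bit) faults to a per-line repair action.

    Assumed policy, for illustration only:
      - exactly one failing bit in a line -> DCR, recording the failing column
      - two or more failing bits          -> LD, the whole line is disabled
    Lines without faults do not appear in the returned plan.
    """
    bits_by_line = defaultdict(set)
    for line, bit in data_faults:
        bits_by_line[line].add(bit)  # a set ignores repeated reports

    plan = {}
    for line, bits in bits_by_line.items():
        if len(bits) == 1:
            plan[line] = ("DCR", next(iter(bits)))
        else:
            plan[line] = ("LD", None)
    return plan

# Hypothetical fault log: line 3 has one bad bit (reported twice),
# line 8 has two bad bits.
example_faults = [(3, 17), (8, 4), (8, 200), (3, 17)]
example_plan = plan_repairs(example_faults)

if __name__ == "__main__":
    for line, action in sorted(example_plan.items()):
        print(line, action)
```

Because the redundancy is reprogrammable, a plan like this could in principle be regenerated whenever BIST is rerun at a new voltage. How the chip stores and applies the entries is not described in this excerpt.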


Similar resources

Maximizing Energy-Efficiency through Joint Optimization of L1 Write Policy, SRAM Design, and Error Protection

L1 cache design contributes significantly to the performance and energy consumption of microprocessors due to L1’s large proportion of die size and high activity factor. Voltage reduction reduces energy per operation; however, increased process variability in modern technology nodes causes an exponential growth in SRAM failure probability for linear voltage reduction, necessitating some form of...


Improving Energy Efficiency and Reducing Code Size with RISC-V Compressed

Delivering the instruction stream can be the largest source of energy consumption in a processor, yet loosely-encoded RISC instruction sets are wasteful of instruction bandwidth. Aiming to improve the performance and energy efficiency of the RISC-V ISA, this thesis proposes RISC-V Compressed (RVC), a variable-length instruction set extension. RVC is a superset of the RISC-V ISA, encoding the mo...


Processor Design for Digital Flight Control Computer

This paper presents the design and FPGA implementation of a 32-bit configurable microcontroller. The microcontroller contains a 32-bit processor based on the RISC-V Instruction Set Architecture, cache memories, interrupt support, multiplexed buses and a Debug Unit. The processor supports all integer arithmetic. Cache memories have various sizes up to 16kB. Prioritized stacked interrupt control is pr...



Performance and power effectiveness in embedded processors customizable partitioned caches

This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption and improved determinism of cri...



Publication date: 2016